The biggest drawback of antivirus (AV) experiments is label heterogeneity: each AV labels the same samples very differently. Whereas AV labeling issues have been well studied from the sample point of view, they have not been widely studied from the AV perspective, i.e., to what extent label diversity allows AV identification. Thus, we ask: (1) How unique among all AVs are the labels produced by a given AV? (2) Can we fingerprint AVs based on their assigned labels? and (3) How many labels are required to fingerprint an AV? In this work, we answer these questions via experiments with a dataset of 720,000 AV-assigned labels for Windows malware spread over 15 years (2006-2020). We discovered that: (1) AVs can be fingerprinted by their assigned labels with 100% accuracy in many cases; (2) AVs can be fingerprinted with a confidence score of 99% using only 1% of the dataset; (3) AV fingerprinting rates vary over time, as label changes caused by AV updates have a key effect on AV recognition, causing some AV models to lose their ability to recognize their AV-generated labels over time; and (4) Android AVs can be fingerprinted in the same way as Windows AVs, but Linux labels are harder to group. We expect our work to shed light on the label heterogeneity problem, incentivize further developments to mitigate it, and provide future works with data to support their design decisions.
Free, publicly-accessible full text available December 1, 2026.
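The fingerprinting idea can be sketched as a simple token-profile classifier: each AV tends to follow its own label naming scheme, so the tokens in a label often reveal the vendor. The vendor names, labels, and scoring below are illustrative placeholders, not the paper's actual models or data:

```python
from collections import Counter

# Hypothetical training labels: each AV follows its own naming scheme.
TRAIN = {
    "AV_A": ["Trojan.Win32.Agent.abc", "Trojan.Win32.Generic.xyz"],
    "AV_B": ["W32/Agent.ABC!tr", "W32/Generic.XYZ!tr"],
}

def tokens(label):
    # Split a label into scheme tokens; separators vary per vendor.
    out, cur = [], ""
    for ch in label:
        if ch.isalnum():
            cur += ch
        else:
            if cur:
                out.append(cur.lower())
            cur = ""
    if cur:
        out.append(cur.lower())
    return out

# Build one token-frequency profile per AV.
PROFILES = {av: Counter(t for lbl in lbls for t in tokens(lbl))
            for av, lbls in TRAIN.items()}

def fingerprint(label):
    # Score each AV by how often the label's tokens appear in its profile.
    toks = tokens(label)
    scores = {av: sum(prof[t] for t in toks) for av, prof in PROFILES.items()}
    return max(scores, key=scores.get)

print(fingerprint("Trojan.Win32.Agent.def"))  # → AV_A
```

A real pipeline would learn such profiles from hundreds of thousands of labels, but the principle is the same: scheme tokens act as a vendor signature.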
-
In real life, distinct runs of the same artifact lead to the exploration of different paths, due to either the system's natural randomness or malicious constructions. These variations might completely change execution outcomes (extreme case). Thus, to analyze malware beyond theoretical models, we must consider the execution of multiple paths. The academic literature presents many approaches for multipath analysis (e.g., fuzzing, symbolic, and concolic executions), but it still fails to answer: What is the current state of multipath malware tracing? This work aims to answer this question and also to point out what developments are still required to make these approaches practical. Thus, we present a literature survey and perform experiments to bridge theory and practice. Our results show that (i) natural variation is frequent; (ii) fuzzing helps to discover more paths; (iii) fuzzing can be guided to increase coverage; (iv) forced execution maximizes path discovery rates; (v) pure symbolic execution is impractical; and (vi) concolic execution is promising but still requires further developments.
Free, publicly-accessible full text available December 31, 2025.
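As a toy illustration of why forced execution maximizes path discovery (item iv), the sketch below overrides every branch outcome to enumerate all paths of a small branch-structured "program". The program, branch semantics, and path names are hypothetical:

```python
# Toy sketch of forced execution: instead of running the sample once with a
# concrete input, every combination of branch outcomes is forced, so even
# paths guarded by anti-analysis checks are reached.

def program(branch_decider):
    path = []
    if branch_decider(0):          # e.g., a hypothetical anti-analysis check
        path.append("evasion")
        return path
    path.append("payload_stage1")
    if branch_decider(1):
        path.append("payload_net")
    else:
        path.append("payload_disk")
    return path

def forced_execution():
    # Force every combination of the 2 branch outcomes (up to 4 runs).
    paths = []
    for mask in range(4):
        decider = lambda i, m=mask: bool((m >> i) & 1)
        p = program(decider)
        if p not in paths:
            paths.append(p)
    return paths

print(forced_execution())  # 3 unique paths out of 4 forced runs
```

A single concrete run would cover exactly one of these paths; forcing branches trades soundness (some forced paths may be infeasible) for coverage.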
-
Spafford, Eugene (Ed.) Developing and evaluating malware classification pipelines that reflect real-world needs is as vital to protecting users as it is hard to achieve. In many cases, the experimental conditions under which an approach was developed and its deployment settings mismatch, which causes solutions not to achieve the desired results. In this work, we explore how unrealistic project and evaluation decisions in the literature are. In particular, we shed light on the problem of label delays, i.e., the assumption that ground-truth labels for classifier retraining are always available, when in the real world they take significant time to be produced, which also creates a significant attack opportunity window. In our analyses, among diverse aspects, we address: (1) the use of metrics that do not account for the effect of time; (2) the occurrence of concept drift and ideal assumptions about the amount of drift data a system can handle; and (3) ideal assumptions about the availability of oracle data for drift detection and the need to rely on pseudo-labels to mitigate drift-related delays. We present experiments based on a newly proposed exposure metric to show that delayed labels due to limited analysis queue sizes impose a significant challenge for detection (e.g., up to a 75% greater attack opportunity in the real world than in the experimental setting) and that pseudo-labels are useful in mitigating the delays (reducing the detection loss to only 30% of the original value).
Free, publicly-accessible full text available January 1, 2026.
-
Free, publicly-accessible full text available April 9, 2026
-
Malware analysis tasks are as fundamental to modern cybersecurity as they are challenging to perform. More than depending on any tool's capabilities, malware analysis tasks depend on human analysts' abilities, experience, and practices when using the tools. Academic research has traditionally focused on producing solutions to overcome the technical challenges of malware analysis, but are these solutions adopted in practice by malware analysts? Are these solutions useful? If not, how can the academic community improve its practices to foster adoption and have a greater impact? To answer these questions, we surveyed 21 professional malware analysts working in different companies, from CSIRTs to AV companies, to hear their opinions about existing tools, practices, and the challenges they face in their daily tasks. In 31 questions, we cover a broad range of aspects, from the number of observed malware variants to the use of public sandboxes and the tools the analysts would like to exist to make their lives easier. We aim to bridge the gap between academic developments and malware analysis practices. To do so, on the one hand, we suggest to analysts solutions proposed in the literature that could be integrated into their practices. On the other hand, we point out to the academic community possible future directions to bridge existing development gaps that significantly affect malware analysis practices.
-
Machine learning (ML) is a key part of modern malware detection pipelines, but its application is not straightforward. It involves multiple practical challenges that are frequently unaddressed in the literature. A key challenge is the heterogeneity of scenarios: antivirus (AV) companies, for instance, operate under different performance constraints in the backend and on the endpoint, and with a diversity of datasets according to the country in which they operate. In this paper, we evaluate the impact of these heterogeneous aspects by developing a classification pipeline for three datasets of 10K malware samples each, collected by an AV company in the USA, Brazil, and Japan in the same period. We characterize the different requirements of these datasets and show that a different number of features is required to reach the optimal detection rate in each scenario. We show that a global model combining the three datasets increases the detection rate on each of the three individual datasets. We propose using Federated Learning (FL) to build the global model and a distillation process to generate the local versions. We order the samples temporally to show that, although retraining on concept drift detection helps recover the detection rate, only an FL approach can increase it.
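The global-model idea can be sketched with a minimal federated-averaging step: each region trains a local model on its own data, and a server averages the parameters weighted by dataset size. This is a generic FedAvg-style sketch under invented 1-D data, not the paper's pipeline:

```python
import math

# Minimal FedAvg-style sketch: 1-D logistic regression per region, then a
# size-weighted parameter average on the server. All data is illustrative.

def local_train(data, lr=0.1, epochs=100):
    # Tiny per-sample gradient descent on (feature, label) pairs.
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            pred = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad = pred - y
            w -= lr * grad * x
            b -= lr * grad
    return w, b

def fed_avg(local_params, sizes):
    # Server step: average (w, b) weighted by local dataset sizes.
    total = sum(sizes)
    w = sum(p[0] * n for p, n in zip(local_params, sizes)) / total
    b = sum(p[1] * n for p, n in zip(local_params, sizes)) / total
    return w, b

# Hypothetical regional datasets (the feature could be, e.g., an entropy score).
usa    = [(0.1, 0), (0.9, 1), (0.2, 0), (0.8, 1)]
brazil = [(0.2, 0), (1.0, 1), (0.1, 0), (0.9, 1)]
japan  = [(0.3, 0), (1.1, 1), (0.2, 0), (1.0, 1)]

params = [local_train(d) for d in (usa, brazil, japan)]
global_w, global_b = fed_avg(params, [len(usa), len(brazil), len(japan)])
```

The appeal in the AV setting is that raw regional samples never leave the region; only model parameters are shared, and the distilled local versions can then fit endpoint performance constraints.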
-
Balzarotti, Davide; Xu, Wenyuan (Ed.) On-device ML is increasingly used in different applications. It brings convenience to offline tasks and avoids sending user-private data through the network. On-device ML models are valuable and may suffer from model extraction attacks of different categories. Existing studies lack a deep understanding of on-device ML model security, which creates a gap between research and practice. This paper provides a systematization approach to classify existing model extraction attacks and defenses based on different threat models. We evaluated well-known research projects from existing work with real-world ML models and discussed their reproducibility, computational complexity, and power consumption. We identified the challenges that hinder the wide adoption of these research projects in practice. We also provide directions for future research in ML model extraction security.
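One category of such attacks, query-based extraction, can be reduced to a toy sketch: the attacker treats the on-device model as a black box and fits a surrogate that mimics its decision boundary from query responses alone. The victim threshold and query strategy below are invented for illustration:

```python
# Toy query-based model extraction: the attacker only sees victim outputs,
# yet recovers a functionally equivalent surrogate. Purely illustrative.

def victim(x):
    # Black-box on-device model; its threshold is unknown to the attacker.
    return 1 if x > 0.42 else 0

def extract(query_points):
    # Fit the simplest surrogate consistent with the observed labels: the
    # midpoint between the highest 0-labeled and lowest 1-labeled query.
    labeled = sorted((x, victim(x)) for x in query_points)
    lo = max((x for x, y in labeled if y == 0), default=0.0)
    hi = min((x for x, y in labeled if y == 1), default=1.0)
    return (lo + hi) / 2

boundary = extract([i / 100 for i in range(101)])  # a uniform query sweep
```

Real extraction attacks target far larger models and budgets, but the threat-model question is the same: what can be inferred from outputs alone, and at what query cost.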
-
Li, Ninghui (Ed.) Malware analysis is an essential task to understand infection campaigns, the behavior of malicious code, and possible ways to mitigate threats. Malware analysis also allows better assessment of attackers' capabilities, techniques, and processes. Although a substantial amount of previous work has provided comprehensive analyses of the international malware ecosystem, research on regionalized, country- and population-specific malware campaigns has been scarce. Moving toward addressing this gap, we conducted a longitudinal (2012-2020) and comprehensive (encompassing an entire population of online banking users) study of MS Windows desktop malware that actually infected Brazilian banks' users. We found that Brazilian financial desktop malware has been evolving quickly: it started to use a variety of file formats instead of typical PE binaries, relied on native system resources, and abused obfuscation techniques to bypass detection mechanisms. Our study of the threats targeting a significant population in the ecosystem of the largest and most populous country in Latin America can provide invaluable insights that may apply to other countries' user populations, especially those in the developing world that might face cultural peculiarities similar to Brazil's. With this evaluation, we expect to motivate the security community and industry to seriously consider a deeper level of customization during the development of next-generation anti-malware solutions, as well as to raise awareness of regionalized and targeted Internet threats.
-
A promising avenue for improving the effectiveness of behavioral-based malware detectors is to leverage two-phase detection mechanisms. An existing problem in two-phase detection is that, after the first phase produces a borderline decision, suspicious behaviors are not well contained before the second phase completes. This paper improves CHAMELEON, a framework that realizes an uncertain environment. CHAMELEON offers two environments: standard, for software identified as benign by the first phase, and uncertain, for software that received a borderline classification from the first phase. The uncertain environment adds obstacles to software execution through random perturbations applied probabilistically. We introduce a dynamic perturbation threshold that can target malware disproportionately more than benign software. We analyzed the effects of the uncertain environment by manually studying 113 benign software samples and 100 malware samples, and found that 92% of the malware and 10% of the benign software were disrupted during execution. The results were then corroborated on an extended dataset (5,679 Linux malware samples) on a newer system. Finally, a careful inspection of the benign software crashes revealed some software bugs, highlighting CHAMELEON's potential as a practical complementary anti-malware solution.
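The core mechanism, probabilistic perturbation with a suspicion-dependent threshold, can be sketched in a few lines. The function names, base rate, and scaling are hypothetical stand-ins, not CHAMELEON's actual parameters:

```python
import random

# Illustrative sketch of an "uncertain environment": operations are perturbed
# probabilistically, and a dynamic threshold makes the perturbation rate grow
# with the first-phase suspicion score, hitting likely malware harder.

def perturb_probability(suspicion_score, base=0.05, scale=0.5):
    # Dynamic threshold: higher first-phase suspicion -> more perturbation.
    return min(1.0, base + scale * suspicion_score)

def maybe_perturb(op_result, suspicion_score, draw=None, rng=random):
    # draw lets callers (and tests) inject a deterministic random sample.
    if draw is None:
        draw = rng.random()
    if draw < perturb_probability(suspicion_score):
        return None  # perturbed: e.g., silently fail the operation or delay it
    return op_result
```

The intuition matches the reported numbers: malware, which tends to depend on precise operation outcomes and receives higher suspicion scores, is disrupted far more often than benign software running in the same environment.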